AITopics | pre-training algorithm

Collaborating Authors

pre-training algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Song, Jiaming, Zhou, Linqi

arXiv.org Artificial IntelligenceMar-11-2025

Recent years have seen significant advancements in foundation models through generative pre-training, yet algorithmic innovation in this space has largely stagnated around autoregressive models for discrete signals and diffusion models for continuous signals. This stagnation creates a bottleneck that prevents us from fully unlocking the potential of rich multi-modal data, which in turn limits the progress on multimodal intelligence. We argue that an inference-first perspective, which prioritizes scaling efficiency during inference time across sequence length and refinement steps, can inspire novel generative pre-training algorithms. Using Inductive Moment Matching (IMM) as a concrete example, we demonstrate how addressing limitations in diffusion models' inference process through targeted modifications yields a stable, single-stage algorithm that achieves superior sample quality with over an order of magnitude greater inference efficiency.

algorithm, arxiv preprint arxiv, diffusion model, (10 more...)

arXiv.org Artificial Intelligence

2503.07154

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Learning End-to-End Channel Coding with Diffusion Models

Kim, Muah, Fritschek, Rick, Schaefer, Rafael F.

arXiv.org Artificial IntelligenceSep-21-2023

The training of neural encoders via deep learning necessitates a differentiable channel model due to the backpropagation algorithm. This requirement can be sidestepped by approximating either the channel distribution or its gradient through pilot signals in real-world scenarios. The initial approach draws upon the latest advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models, which have demonstrated high sample quality in image generation. We offer an end-to-end channel coding framework underpinned by diffusion models and propose an efficient training algorithm. Our simulations with various channel models establish that our diffusion models learn the channel distribution accurately, thereby achieving near-optimal end-to-end symbol error rates (SERs). We also note a significant advantage of diffusion models: A robust generalization capability in high signal-to-noise ratio regions, in contrast to GAN variants that suffer from error floor. Furthermore, we examine the trade-off between sample quality and sampling speed, when an accelerated sampling algorithm is deployed, and investigate the effect of the noise scheduling on this trade-off. With an apt choice of noise scheduling, sampling time can be significantly reduced with a minor increase in SER.

algorithm, diffusion model, pre-training algorithm, (16 more...)

arXiv.org Artificial Intelligence

2309.10505

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Distributive Pre-Training of Generative Modeling Using Matrix-Product States

Lin, Sheng-Hsuan, Kuijpers, Olivier, Peterhansl, Sebastian, Pollmann, Frank

arXiv.org Artificial IntelligenceJun-26-2023

Tensor networks have recently found applications in machine learning for both supervised learning and unsupervised learning. The most common approaches for training these models are gradient descent methods. In this work, we consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression. The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation. The algorithm could be parallelized easily and only iterates through the dataset once. Hence, it serves as a pre-training algorithm. We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks. Furthermore, we provide an interpretation of the algorithm as a compressed quantum kernel density estimation for the probability amplitude of input data.

artificial intelligence, feature map, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.14787

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

A Better Way to Pretrain Deep Boltzmann Machines

Neural Information Processing SystemsApr-6-2023, 12:26:19 GMT

We describe how the pre-training algorithm for Deep Boltzmann Machines (DBMs) is related to the pre-training algorithm for Deep Belief Networks and we show that under certain conditions, the pre-training procedure improves the variational lower bound of a two-hidden-layer DBM. Based on this analysis, we develop a different method of pre-training DBMs that distributes the modelling work more evenly over the hidden layers. Our results on the MNIST and NORB datasets demonstrate that the new pre-training algorithm allows us to learn better generative models.

deep boltzmann machine, pre-training algorithm, pretrain deep boltzmann machine, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

A Better Way to Pretrain Deep Boltzmann Machines

Hinton, Geoffrey E., Salakhutdinov, Russ R.

Neural Information Processing SystemsFeb-14-2020, 23:56:54 GMT

artificial intelligence, deep boltzmann machine, machine learning, (4 more...)

Neural Information Processing Systems

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback